Vandercappelle, P. G. Kjeldsberg

April 2001 ACM Transactions on Design Automation of Electronic Systems (TODAES), 
Additional Information: full citation, abstract, references, citings, index terms
We present a survey of the state-of-the-art techniques used in performing data and
memory-related optimizations in embedded systems. The optimizations are targeted directly
or indirectly at the memory subsystem, and impact one or more out of three important cost
metrics: area, performance, and power dissipation of the resulting implementation. We first
examine architecture-independent optimizations in the form of code transformations. We
next cover a broad spectrum of optimizati ... 

Keywords: DRAM, SRAM, address generation, allocation, architecture exploration, code 
transformation, data cache, data optimization, high-level synthesis, memory architecture 
customization, memory power dissipation, register file, size estimation, survey 
L. Wang, C. L. Wu

November 1988 Proceedings of the 1988 ACM/IEEE conference on Supercomputing 

Full text available: pdf(978.12 KB) Additional Information: full citation, abstract, references, index terms

Conventional instruction issuing methods use a hardware control mechanism to issue
instructions in multiple-functional-unit systems. They reach physical limitations due to the
complexity of issuing logic when they intend to issue multiple instructions per cycle. A new
method, I-NET, is presented in this paper to overcome this shortcoming. I-NET uses a
post-compiler to detect the data dependencies among instructions. The detected data
dependence is then attached to the instruction code to form ... 
Volume 5 Issue 2

Additional Information: full citation, abstract, references, citings, index terms
This tutorial surveys design methods for energy-efficient system-level design. We consider
electronic systems consisting of a hardware platform and software layers. We consider the
three major constituents of hardware that consume energy, namely computation,
communication, and storage units, and we review methods of reducing their energy
consumption. We also study models for analyzing the energy cost of software, and methods
for energy-efficient software design and compilation. This survey ...
Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Yoshiyuki Mochizuki, Akio Nishimura,
Yoshimori Nakase, Teiji Nishizawa

April 1992 ACM SIGARCH Computer Architecture News, Proceedings of the 19th
annual international symposium on Computer architecture, Volume 20 issue 2

Full text available: pdf(1.03 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we propose a multithreaded processor architecture which improves machine
throughput. In our processor architecture, instructions from different threads (not a single 
thread) are issued simultaneously to multiple functional units, and these instructions can 
begin execution unless there are functional unit conflicts. This parallel execution scheme 
greatly improves the utilization of the functional unit. Simulation results show that by 
executing two and four threads in parallel ... 



M. D. Smith, M. Johnson, M. A. Horowitz 

April 1989 ACM SIGARCH Computer Architecture News, Proceedings of the third
international conference on Architectural support for programming 
languages and operating systems, volume 17 issue 2 

Full text available: pdf(1.56 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper investigates the limitations on designing a processor which can sustain an 
execution rate of greater than one instruction per cycle on highly-optimized, non-scientific 
applications. We have used trace-driven simulations to determine that these applications 
contain enough instruction independence to sustain an instruction rate of about two 
instructions per cycle. In a straightforward implementation, cost considerations argue 
strongly against decoding more than two instructions in ... 
Theo Ungerer, Borut Robic, Jurij Silc

March 2003 ACM Computing Surveys (CSUR), volume 35 issue 1 

Full text available: pdf(920.16 KB) Additional Information: full citation, abstract, references, index terms

Hardware multithreading is becoming a generally applied technique in the next generation of 
microprocessors. Several multithreaded processors are announced by industry or already 
into production in the areas of high-performance microprocessors, media, and network 
processors. A multithreaded processor is able to pursue two or more threads of control in 
parallel within the processor pipeline. The contexts of two or more threads of control are 
often stored in separate on-chip register sets. Unused i ... 

Keywords: Blocked multithreading, interleaved multithreading, simultaneous 
multithreading 



Critical issues regarding HPS, a high performance microarchitecture 
Y. N. Patt, S. W. Melvin, W. M. Hwu, M. C. Shebanow 

December 1985 ACM SIGMICRO Newsletter, Proceedings of the 18th annual workshop
on Microprogramming, volume 16 issue 4
HPS is a new model for a high performance microarchitecture which is targeted for
implementing very dissimilar ISP architectures. It derives its performance from executing 
the operations within a restricted window of a program out-of-order, asynchronously, and 
concurrently whenever possible. Before the model can be reduced to an effective working
implementation of a particular target architecture, several issues need to be resolved. This 
paper discusses these issues, both in general and in ... 
September 2001 Journal on Educational Resources in Computing (JERIC)

Full text available: pdf(613.53 KB)
Keywords: bitline segmentation, low power comparator, low power instruction scheduling,
low-power superscalar datapath 
Hideki Ando, Chikako Nakanishi, Tetsuya Hara, Masao Nakaya

May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture, volume 23 issue 2

Full text available: pdf(1.50 MB) Additional Information: full citation, references, citings, index terms

Speculative execution is execution of instructions before it is known whether these 
instructions should be executed. Compiler-based speculative execution has the potential to 
achieve both a high instruction per cycle rate and high clock rate. Pure compiler-based 
approaches, however, have greatly limited instruction scheduling due to a limited ability to 
handle side effects of speculative execution. Significant performance improvement is, thus, 
difficult in non-numerical applications. This paper ... 

Empirical performance evaluation of concurrency and coherency control protocols for
database sharing systems
Erhard Rahm 

June 1993 ACM Transactions on Database Systems (TODS), volume 18 Issue 2 

Full text available: pdf(3.37 MB) Additional Information: full citation, abstract, references, citings, index terms
Database Sharing (DB-sharing) refers to a general approach for building a distributed high
performance transaction system. The nodes of a DB-sharing system are locally coupled via a 
high-speed interconnect and share a common database at the disk level. This is also known 
as a "shared disk" approach. We compare database sharing with the database partitioning 
(shared nothing) approach and discuss the functional DBMS components that require new 
and coordinated solutions for DB-shar ... 

Keywords: coherency control, concurrency control, database partitioning, database 
sharing, performance analysis, shared disk, shared nothing, trace-driven simulation 
Embedded systems often include a traditional processor capable of executing sequential
code, but both control and data-dominated tasks are often more naturally expressed using 
one of the many domain-specific concurrent specification languages. This article surveys a 
variety of techniques for translating these concurrent specifications into sequential code. The 
techniques address compiling a wide variety of languages, ranging from dataflow to Petri 
nets. Each uses a different method, to some degr ... 

Keywords: Compilation, Esterel, Lustre, Petri nets, Verilog, code generation, 
communication, concurrency, dataflow, discrete-event, partial evaluation, sequential 
In highly-pipelined machines, instructions and data are prefetched and buffered in both the
processor and the cache. This is done to reduce the average memory access latency and to 
take advantage of memory interleaving. Lock-up free caches are designed to avoid 
processor blocking on a cache miss. Write buffers are often included in a pipelined machine 
to avoid processor waiting on writes. In a shared memory multiprocessor, there are more 
advantages in buffering memory requests, since each m ... 
Chih-Po Wen

April 1992 Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing: 
technological challenges of the 1990's 
Full text available: pdf(4.67 MB)
Intergraph's CLIPPER microprocessor is a high performance, three chip module that
implements a new instruction set architecture designed for convenient programmability, 
broad functionality, and easy future expansion. 
annual international symposium on Computer Architecture, volume 18 issue 3
This paper describes the architecture for issuing multiple instructions per clock in the
NonStop Cyclone Processor. Pairs of instructions are fetched and decoded by a dual
two-stage prefetch pipeline and passed to a dual six-stage pipeline for execution. Dynamic
branch prediction is used to reduce branch penalties. A unique microcode routine for each 
pair is stored in the large duplexed control store. The microcode controls parallel data paths 
optimized for executing the most frequent instr ... 